Generalized Approximate Survey Propagation for High-Dimensional Estimation
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal
that is observed through a linear transform followed by a component-wise,
possibly nonlinear and noisy, channel. In the Bayesian optimal setting,
Generalized Approximate Message Passing (GAMP) is known to achieve optimal
performance for GLE. However, its performance can significantly degrade
whenever there is a mismatch between the assumed and the true generative model,
a situation frequently encountered in practice. In this paper, we propose a new
algorithm, named Generalized Approximate Survey Propagation (GASP), for solving
GLE in the presence of prior or model mis-specifications. As a prototypical
example, we consider the phase retrieval problem, where we show that GASP
outperforms the corresponding GAMP, reducing the reconstruction threshold and,
for certain choices of its parameters, approaching Bayesian optimal
performance. Furthermore, we present a set of State Evolution equations that
exactly characterize the dynamics of GASP in the high-dimensional limit.
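To make the setting concrete, the following is a minimal sketch of the GLE observation model for phase retrieval, with a plain gradient-descent estimator standing in for the message-passing machinery (GAMP/GASP themselves are more involved; all dimensions, step sizes, and the loss are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 600                               # signal dimension, number of measurements
x_true = rng.standard_normal(n)               # Gaussian signal (assumed prior)
W = rng.standard_normal((m, n)) / np.sqrt(n)  # i.i.d. sensing matrix

# Component-wise nonlinear channel: phase retrieval discards the sign.
y = np.abs(W @ x_true)

# Naive gradient descent on the squared amplitude residual (illustration only;
# GASP/GAMP are message-passing algorithms, not plain gradient descent).
x = rng.standard_normal(n)
lr = 0.05
for _ in range(2000):
    z = W @ x
    grad = W.T @ ((np.abs(z) - y) * np.sign(z)) / m
    x -= lr * grad

# Phase retrieval can recover x only up to a global sign, so we report |overlap|.
overlap = abs(x @ x_true) / (np.linalg.norm(x) * np.linalg.norm(x_true))
print(f"normalized overlap with ground truth: {overlap:.3f}")
```

The overlap reached by this naive estimator depends strongly on the initialization and the measurement ratio m/n, which is exactly the kind of algorithmic gap that motivates GASP.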
Out of equilibrium Statistical Physics of learning
In the study of hard optimization problems, it is often infeasible to achieve
full analytic control over the dynamics of the algorithmic processes that
find solutions efficiently. In many cases, a static approach is able to provide
considerable insight into the dynamical properties of these algorithms: in fact,
the geometrical structures found in the energetic landscape can strongly affect
the stationary states and the optimal configurations reached by the solvers.
In this context, a classical Statistical Mechanics approach, relying on the
assumption of the asymptotic realization of a Boltzmann-Gibbs equilibrium,
can yield misleading predictions when the studied algorithms comprise some
stochastic components that effectively drive these processes out of equilibrium.
Thus, it becomes necessary to develop some intuition on the relevant features
of the studied phenomena and to build an ad hoc Large Deviation analysis,
providing a more targeted and richer description of the geometrical properties
of the landscape. The present thesis focuses on the study of learning processes
in Artificial Neural Networks, with the aim of introducing an out of equilibrium
statistical physics framework, based on the introduction of a local entropy
potential, for supporting and inspiring algorithmic improvements in the field
of Deep Learning, and for developing models of neural computation that can
carry both biological and engineering interest.
Solvable Model for Inheriting the Regularization through Knowledge Distillation
In recent years the empirical success of transfer learning with neural
networks has stimulated an increasing interest in obtaining a theoretical
understanding of its core properties. Knowledge distillation, where a smaller
neural network is trained using the outputs of a larger neural network, is a
particularly interesting case of transfer learning. In the present work, we
introduce a statistical physics framework that allows an analytic
characterization of the properties of knowledge distillation (KD) in shallow
neural networks. Focusing the analysis on a solvable model that exhibits a
non-trivial generalization gap, we investigate the effectiveness of KD. We are
able to show that, through KD, the regularization properties of the larger
teacher model can be inherited by the smaller student, and that the resulting
generalization performance is closely linked to and limited by the optimality
of the teacher. Finally, we analyze the double descent phenomenology that can
arise in the considered KD setting.
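The inheritance mechanism can be illustrated in an even simpler solvable setting than shallow networks: ridge regression (our substitution, not the paper's model). A student fit on a well-regularized teacher's outputs effectively inherits the teacher's regularization, while the same student fit directly on the noisy labels overfits. All sizes and regularization strengths below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_test = 100, 120, 2000
w_star = np.ones(d) / np.sqrt(d)              # ground-truth rule (assumption)

X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = X_tr @ w_star + rng.standard_normal(n_train)  # noisy training labels
y_te = X_te @ w_star                                 # noiseless test targets

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Teacher: a well-regularized model trained on the noisy labels.
w_teacher = ridge_fit(X_tr, y_tr, lam=100.0)

# Student A: trained directly on the noisy labels, with negligible regularization.
w_direct = ridge_fit(X_tr, y_tr, lam=1e-3)
# Student B: "distilled", i.e. trained on the teacher's smoothed outputs instead.
w_distill = ridge_fit(X_tr, X_tr @ w_teacher, lam=1e-3)

err = lambda w: float(np.mean((X_te @ w - y_te) ** 2))
print(f"direct student:    test MSE {err(w_direct):.3f}")
print(f"distilled student: test MSE {err(w_distill):.3f}")
```

The distilled student essentially recovers the teacher's weights, so its test error tracks the teacher's; its performance is therefore bounded by how well the teacher was regularized, mirroring the abstract's conclusion.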
Large deviations for the perceptron model and consequences for active learning
Active learning is a branch of machine learning that deals with problems
where unlabeled data is abundant yet obtaining labels is expensive. The
learning algorithm has the possibility of querying a limited number of samples
to obtain the corresponding labels, subsequently used for supervised learning.
In this work, we consider the task of choosing the subset of samples to be
labeled from a fixed finite pool of samples. We assume the pool of samples to
be a random matrix and the ground truth labels to be generated by a
single-layer teacher random neural network. We employ replica methods to
analyze the large deviations for the accuracy achieved after supervised
learning on a subset of the original pool. These large deviations then provide
optimal achievable performance bounds for any active learning algorithm. We
show that the optimal learning performance can be efficiently approached by
simple message-passing active learning algorithms. We also provide a comparison
with the performance of some other popular active learning strategies.
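A toy version of the pool-based setup can be sketched as follows, using uncertainty sampling as the selection heuristic — a standard baseline, not the message-passing algorithm analyzed in the paper; dimensions, budget, and the training routine are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, pool_size, budget = 50, 500, 60

X_pool = rng.standard_normal((pool_size, d)) / np.sqrt(d)
w_teacher = rng.standard_normal(d)            # single-layer teacher network
y_pool = np.sign(X_pool @ w_teacher)          # ground-truth labels

def fit_perceptron(X, y, epochs=100):
    # Plain perceptron updates on the labeled subset.
    w = np.zeros(d)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
    return w

def generalization(w, n_test=5000):
    X = rng.standard_normal((n_test, d)) / np.sqrt(d)
    return float(np.mean(np.sign(X @ w) == np.sign(X @ w_teacher)))

# Baseline: label a random subset of the pool.
idx_rand = rng.choice(pool_size, budget, replace=False)
acc_rand = generalization(fit_perceptron(X_pool[idx_rand], y_pool[idx_rand]))

# Uncertainty sampling: iteratively query the pool point closest to the
# current decision boundary.
labeled = list(rng.choice(pool_size, 5, replace=False))
for _ in range(budget - 5):
    w = fit_perceptron(X_pool[labeled], y_pool[labeled])
    margins = np.abs(X_pool @ w)
    margins[labeled] = np.inf                 # exclude already-queried points
    labeled.append(int(np.argmin(margins)))
acc_al = generalization(fit_perceptron(X_pool[labeled], y_pool[labeled]))
print(f"random: {acc_rand:.3f}  uncertainty sampling: {acc_al:.3f}")
```

The large-deviation analysis in the paper gives the ceiling against which heuristics of this kind can be compared.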
On the role of synaptic stochasticity in training low-precision neural networks
Stochasticity and limited precision of synaptic weights in neural network
models are key aspects of both biological and hardware modeling of learning
processes. Here we show that a neural network model with stochastic binary
weights naturally gives prominence to exponentially rare dense regions of
solutions with a number of desirable properties such as robustness and good
generalization performance, while typical solutions are isolated and hard to
find. Binary solutions of the standard perceptron problem are obtained from a
simple gradient descent procedure on a set of real values parametrizing a
probability distribution over the binary synapses. Both analytical and
numerical results are presented. An algorithmic extension aimed at training
discrete deep neural networks is also investigated.
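The training procedure described above — gradient descent on real fields parametrizing a probability distribution over binary synapses, followed by binarization — can be sketched as follows (the specific loss, learning rate, and sizes are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 201, 60                         # N synapses (odd, so fields never tie), P patterns
xi = rng.choice([-1.0, 1.0], size=(P, N))    # random binary patterns
sigma = rng.choice([-1.0, 1.0], size=P)      # target outputs

# Real fields h parametrize independent distributions over binary weights;
# the mean weight under this product measure is tanh(h).
h = 0.01 * rng.standard_normal(N)
lr = 0.5
for _ in range(500):
    m = np.tanh(h)
    pre = (xi @ m) / np.sqrt(N)
    viol = sigma * pre < 0.5           # patterns below the target stability
    # Hinge-type gradient on the mean network, chained through dm/dh = 1 - m^2.
    grad = -(sigma[viol, None] * xi[viol]).sum(0) / np.sqrt(N) * (1 - m**2)
    h -= lr * grad / P

# Project onto the binary hypercube and count unsatisfied patterns.
w_binary = np.sign(h)
errors = int(np.sum(sigma * (xi @ w_binary) <= 0))
print(f"misclassified patterns after binarization: {errors} / {P}")
```

That binarizing the continuous parametrization still yields a working binary solution reflects the dense, robust solution regions the abstract describes: isolated solutions would not survive such a crude projection.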
Learning may need only a few bits of synaptic precision
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic
states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by
hardware implementation considerations as well. In this paper we extend a previous large deviations analysis
which unveiled the existence of peculiar dense regions in the space of synaptic states that account for the
possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with
multiple states and generally more plausible biological features. The results clearly indicate that the overall
qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of
the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic
precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that,
for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent
biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient
algorithmic search strategies.
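The saturation effect can be probed with a crude numerical experiment (our construction, not the paper's analysis): train a continuous perceptron, quantize its weights to 2^b uniformly spaced levels, and watch the accuracy plateau after the first few bits. All sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 500, 1000
X = rng.standard_normal((P, N)) / np.sqrt(N)
w_star = rng.standard_normal(N)
y = np.sign(X @ w_star)                # separable labels from a teacher

# Train a continuous perceptron on the full dataset.
w = np.zeros(N)
for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi

def quantize(w, bits):
    # Snap each weight to the nearest of 2**bits symmetric levels.
    a = np.max(np.abs(w))
    levels = np.linspace(-a, a, 2 ** bits)
    idx = np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)
    return levels[idx]

for bits in (1, 2, 3, 4, 8):
    wq = quantize(w, bits)
    acc = float(np.mean(np.sign(X @ wq) == y))
    print(f"{bits} bit(s): train accuracy {acc:.3f}")
```

In runs of this kind, most of the gap to full precision is already closed within the first few bits, consistent with the abstract's quantitative claim.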
The star-shaped space of solutions of the spherical negative perceptron
Empirical studies on the landscape of neural networks have shown that
low-energy configurations are often found in complex connected structures,
where zero-energy paths between pairs of distant solutions can be constructed.
Here we consider the spherical negative perceptron, a prototypical non-convex
neural network model framed as a continuous constraint satisfaction problem. We
introduce a general analytical method for computing energy barriers in the
simplex with vertex configurations sampled from the equilibrium. We find that
in the over-parameterized regime the solution manifold displays simple
connectivity properties. There exists a large geodesically convex component
that is attractive for a wide range of optimization dynamics. Inside this
region we identify a subset of atypical high-margin solutions that are
geodesically connected with most other solutions, giving rise to a star-shaped
geometry. We analytically characterize the organization of the connected space
of solutions and show numerical evidence of a transition, at larger constraint
densities, where the aforementioned simple geodesic connectivity breaks down.
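A minimal numerical analogue of the geodesic-path analysis can be sketched as follows (a sketch under simplifying assumptions: spherical perceptron constraints with a negative margin, two independently found solutions, and energy counted as the number of violated constraints — not the paper's equilibrium-sampled simplex construction):

```python
import numpy as np

rng = np.random.default_rng(5)
N, P, kappa = 200, 300, -0.5           # negative margin => non-convex problem

X = rng.standard_normal((P, N)) / np.sqrt(N)   # constraint patterns

def solve(seed):
    # Crude margin-perceptron dynamics constrained to the sphere |w| = sqrt(N).
    w = np.random.default_rng(seed).standard_normal(N)
    for _ in range(2000):
        w *= np.sqrt(N) / np.linalg.norm(w)
        viol = (X @ w) < kappa
        w += 0.1 * X[viol].sum(0)
    return w * np.sqrt(N) / np.linalg.norm(w)

def energy(w):
    # Number of violated constraints x_mu . w >= kappa.
    return int(np.sum(X @ w < kappa))

w1, w2 = solve(10), solve(11)          # two independent solutions

# Energy profile along the geodesic (great-circle) path between them.
cos_t = (w1 @ w2) / N
theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
for t in np.linspace(0, 1, 11):
    wt = (np.sin((1 - t) * theta) * w1 + np.sin(t * theta) * w2) / np.sin(theta)
    print(f"t={t:.1f}  energy={energy(wt)}")
```

Whether the energy stays at zero along the whole path, or a barrier appears, is precisely the connectivity question the paper addresses analytically as a function of the constraint density P/N.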